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Abstract: This paper introduces an iterative tomogravity algorithm for the 
estimation of a network traffic matrix based on one snapshot observation of 
the link loads in the network. The proposed method does not require complete 
observation of the total load on individual edge links or proper tuning of a 
penalty parameter as existing methods do. Numerical results are presented 
to demonstrate that the iterative tomogravity method controls the estimation 
error well when the link data is fully observed and produces robust results 
with moderate amount of missing link data. 



1. Introduction 

This paper concerns the estimation of network traffic based on link data. The traf- 
fic matrix of a network, which gives the amount of source-to-destination (SD) flow, 
is an essential element in a wide range of network administration and engineering 
applications. However, in today's fast growing communication networks, it is of- 
ten impractical to directly measure network traffic matrices due to cost, network 
protocol and/or administrative constraints, while measurements of the total traffic 
passing through certain individual links are more readily available. Thus, the prob- 
lem of estimating SD traffic based on link data, called network tomography [10( , is 
of great interest to communications service providers. 
In the network tomographic model [Io| 

(1.1) y = Ax, 

where y is a vector of traffic loads on links, A — (a^ ) is a known routing matrix 
with elements a^- = 1 if link i is in the path for the j-th pair of SD nodes and 
dij = otherwise, and x is the SD traffic flow as a vectorization of the traffic 
matrix. Here the routing protocol A is fixed. In typical network applications, the 
number of links (edges) is of the same order as the number of nodes (vertices) in 
the network graph, while the number of SD pairs is of the order of the square of 
the number of nodes. Thus, dim(y) ^> dim(x) and the network tomographic model 
is ill-posed. Vardi [l(| identified the ill-posedness of (jl.lj) as the main difficulty 
of network tomography and proposed to estimate the expected traffic flow based on 
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independent copies of y by modeling the variance of y. The problem has since being 
considered by many research groups. See Vanderbei and Iannone [9( for MLE/EM, 
Cao et al. |2| for MLE/EM in the model Xj ~ N(\j,<fi\j) and non-stationarity 
issues, Medina et al. Liang and Yu [f| for a more scalable pseudo-likelihood, 
Liang et al. 0] for additional direct observations of flow data for selected SD pairs, 
and Coates et al. 0] and Castro et al. Q for surveys with additional references. In 
general, these methods require observations of multiple copies of y. 

An interesting and noticeable development in the area is the introduction of 
(tomo)gravity algorithms and related methods based on a single snapshot of the 



network, i.e. one copy of y. Zhang et al. [llj observed that in certain communica- 



tions networks (e.g. a backbone network where each node represents a PoP, or point 
of presence), almost all the traffic flow is generated by and destined to a known 
set of edge nodes which do not serve as intermediate nodes in any SD paths. Thus, 
each SD path begins with a source edge node, traverses through an inbound edge 
link, an inner network, and then an outbound edge link to a destination edge node. 
Under this assumption, the total inbound flow ivj m ^ from a source node s is the 
sum of the loads over all the inbound edge links from s and the total outbound flow 
jyjout) ^ Q a destination node d is the sum of the loads over all the outbound edge 
links to d. The edge nodes communicate to each other through an inner network 
with a directed graph composed of inner nodes and links and a routing protocol, 



but the inner nodes does not generate or receive traffic. Moreover, Zhang et al. [11 1 



observed that for each fixed source node s, the distribution of the inbound traffic 
Ns ln ^ from s to different destinations d is approximately proportional to the total 
outbound loads N^ ut ^ these destinations receive. Formally, this is called the gravity 
model and can be written in Vardi's [Io| vectorization as 

7y(in) pr(out) 

(1.2) *,= « N * , N = J2N^=J2^ Ut \ 

s d 

where Sj and dj are respectively the source and destination nodes for the j-th SD 
pair, Xj is the corresponding component of the simple gravity solution x as an 
approximation of the vector x in (jl.ip , and N is the total flow. The gravity model 
is best described as 

(1.3) x sd = N^N { d ouf) /N 

with a slight abuse of notation, where x s d is the traffic flow from source s to des- 
tination d in the gravity model, 1.6. X j — X s j dj ■ 

Here, the relationship between the 
link data y and the SD traffic flow x is still governed by the tomographic model 
(jl.ip . Due to the additional information provided in the gravity model (|1.2|) about 
the nature of the SD traffic x, the number of unknowns in x is square rooted. Thus, 
the ill-posedness of (|l.ip is greatly alleviated. In particular, if all link loads are ob- 
served, the total inbound flow and outbound flow N^ out ' for individual edge 
nodes and thus the total traffic N are all available network statistics in the gravity 
model. In addition to the gravity solution x in (|1.2[) . Zhang et al. [ll[ developed 
the simple tomogravity solution 



(1.4) argmin < | 



: Ax 



and more general tomogravity solutions when the edge nodes are further classified 
as "access" or "peering", while Zhang et al. [l2| developed entropy regularized 
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tomogravity solution as 

(1.5) argmin j||y - Axf + <t>N 2 K (x/JV, x/N) }, 

where K(-,-) is the Kullback-Leibler information and <p is a tuning parameter for 
the penalty level. These tomogravity solutions require complete knowledge of the 
total inbound and outbound flow, i.e. Tvi™' and N^ out \ for all individual source and 
destination nodes. They perform reasonably well when such information is available 
and have been implemented in certain AT&T commercial networks. 

In this paper, we propose an iterative tomogravity (ITG) algorithm which alter- 
nately seeks estimates as local optimal solutions in the tomographic space (jl.ip and 
a gravity space of network traffic flow x. Our algorithm, described in Section 2, is 
based on a single snapshot of the link data and does not require the full knowledge 
of the total inbound and outbound flow for all individual edge nodes. The idea is 
to use the gravity space, instead of the specific simple gravity solution Q1.2p , to reg- 
ularize the network tomography problem (jl.ip . In Section 3, we present the results 
of a real-data experiment to demonstrate that the ITG method is competitive com- 
pared with other tomogravity algorithms when the complete link data is available 
and robust when a moderate amount of link data is missing. 



2. An iterative tomogravity algorithm 



In a general network tomographic model, the observed link data, as a sub-vector 
y* of the vector y in (|1.1[) . satisfies 



(2.1) 



= A* 



where the matrix A* = (a*j) is composed of the rows of the routing matrix A in 
(jl.ip corresponding to the observed links, and x is the SD traffic flow as in (11. ip . 
Let J be the total number of SD-pairs of concern. For the observation y* , the 
tomographic space of probability vectors is 



(2.2) 



T* = jf G H J : y* oc A*f, f > 0, l T f = lj, 



where 1 is the vector composed of l's and v T denotes the transpose of a vector v. 
Here and in the sequel, inequalities are applied to all components of vectors. 

In the literature, different types of flow and load are often specifically denoted. 
Let y( ne t) be the link loads of the inner network, y( e< ^S e ) the loads on the links be- 
tween the edge nodes and inner network, y( sen? ) the load on the links from the edge 
nodes to themselves, x( ne ^ the traffic flow between distinct edge nodes (necessarily 
through the inner network), and x( seu °) the flow of the edge nodes to themselves. 
Since x( sen? ) does not go through the inner network and the flow from an edge node 
to itself is the same as the load on the corresponding self-link, the tomographic 
model can be written as 



(2.3) 



/ y(net) 
.(edge) 



(self) 



/ ,4 (net) o 
A( ed § e ) 

V /( self ), 



x (net) 
x (self) 



Ax, 



with /( seuC ) being the identity matrix giving y( seu °) = x ( se ^ , provided that the inner 
network does not generate traffic. This is a special case of Vardi's [l(| tomographic 
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model describing decompositions of the SD traffic x and link load y, but (jl.ip 
can be also viewed as y( ne ^ = J 4( n et) x ( n et) j n paper, the observed y* in (|2.ip 
is a general sub- vector of the y in (12. 3p to allow partial observation of y( eo -S e ) and 
networks without y( se ^ and x( sen ^ . 

Suppose throughout the sequel that the list of the SD-pairs (sj, dj), i = 1, . . . , J, 
forms a product set composed of all the pairings from a set S of source nodes to a 
set D of destination nodes (D ^ S allowed), so that J = where |C| is the 

size of a set C. This gives a one-to-one mapping between R and the space of all 
\S\ x \D\ matrices: 

V = (ui, . . . ,Vj) T ~ (« s d)|S|x|D|: = V Sj d r 

In this notation, the gravity space of probability vectors is 

(2.4) 5 = {geR J :g~ (g«i)\s\x\D\ = pq T , g > o, i T g = 1}, 

i.e. g s d = p s qd or matrices of rank 1, where p £ ft' 5 ' and q 6 Il'' D '. 

Zhang et al. [11] proposed (|1.2[) as the simple gravity algorithm and (|1.4|) as 
the simple tomogravity algorithm. Zhang et al. [l2| proposed (|1.5|) as the entropy- 
regularized tomogravity algorithm. Their basic ideas can be summarized as follows: 
(i) The gravity model gives a rough approximation of the SD flow; (ii) When the 
simple gravity solution (|1.2j) is available, it can be used to regularize Vardi's tomo- 
graphic model (|1.1|) . Motivated by their work, we propose the following algorithm 
which provides estimates of the SD flow x in (|2.1|) . 

Iterative tomogravity algorithm (ITG): 



(2.5) Initialization: g = 1/J 

(2.6) Iteration: f( new) = argmin |if(f , g( old) ) : f e T*| 

(2.7) g( new ) = argmin {^(f( new ),g) : g G g) 

(2.8) Finalization: Jv = (l T y*) / {l T A*f { fin ^) 

(2.9) x = 7Vf( fin) 

where K(i , g) is the Kullback-Leibler information defined as 

J f 

(2-10) K(f,g)=^/ J logJ. 



As mentioned earlier, our basic idea is to use the gravity space (|2.4[) , instead of 
the simple gravity solution (|1.2p , to regularize the tomographic model (|2.2p . A main 
advantage of this approach is that it does not require the knowledge of the simple 
gravity solution or equivalently, the complete observation of loads on all edge links. 
Numerical results in Section 3 demonstrate that when the complete link data y 
is observed, the ITG (|2.9|) and the entropy-regularized tomogravity (|1.5p perform 
comparably in terms of estimation error, and they both outperform the simple 
gravity (II. 2p and tomogravity (|1.4p . Moreover, the ITG without using the knowledge 
of the "access" or "peering" status of links has similar performance compared with 
the generalized tomogravity method (llj which requires such knowledge. We note 
that the ITG method does not need a tuning parameter as (|1.5|) does. 
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A main difference between ITG (|2 .9[) and the simple tomogravity (|1.4p is that the 
simple gravity solution x in (|1.2p is not explicitly used in ITG, since g is treated 
as an unknown in the ITG algorithm. However, the information in the observed 
portions of y( ec te e ) and y( se ^) is still utilized in the ITG iterations through the 
tomographic space (|2.2p . instead of directly computing x from y( ec te e ) and y( sen? ) 
as in ()1.3|) . If the simple gravity x (or an approximation of it if x is not fully 
available) is used as the initialization for ITG, the simple tomogravity solution is 
the result of a single ITG iteration. We may also treat g = x/N as an unknown in 
(II. 5p . cf. Section 4, but then a tuning parameter is still required. 

We use the relaxation algorithm of Krupp (1979) to compute (|2.6p of the ITG, 
while (|2.7p is explicit with 



(new) _ \ ~* .(new) \ ^ ..(new) 

9sd ~ 2-~i * sd> 2-^i *s'd 



as in ()1.3p . Here is a full description of the relaxation algorithm. Let g 



(old) 



(<7i°^, • • • ><?j°^) T be a given probability vector. The problem is to minimize 
-ftT(f , g' ^)) under the linear constraints in (|2.2p . Since y* — implies /y = 
for all j with a*j = 1 and thus reduces the optimization problem to a subset of j, 
we assume y* = (y*, ...,?/*) > where r is the total number of links with observed 
load. Define 

h .. = [ a tilvt -a*j/Vr> i= l,...,r-l, 



1, i = r. 



The linear constraints A*f = y* and l T f = 1 for the tomographic space (|2.2p can 
be written as Hf — (0 T ,1) T , where H = (hij). Krupp's [5( relaxation algorithm 
maximizes 

(2.11) v r - ^2 ffj° ld) ex P \ hi i Vi _ 1 [ 

j=l I i=i J 

over all vectors v = (v\, . . . , v r ) T and then set 

(2.12) /r^f'e^lEVi-j 

As (|2.1ip is concave in v, its optimization is done by the Newton-Raphson method 
for individual components Vi, cycling through i = 1, . . . ,r. Since h r j — 1 for all j, 
f (new) j n (|2,i2p is properly normalized. 

The iteration steps (|2.6|) and (|2.7|) are both monotone in K(i , g), so that the ITG 
algorithm reaches a local minimum of the Kullback-Leibler information between the 
tomographic (|2.2p and gravity (|2.4p spaces. However, since K (f , g) is not convex 
jointly in (f,g) with g in the gravity space, ITG is not guaranteed to converge to 
a global minimum. 



3. An example 



We conduct numerical experiments with data collected over the Abilene Network 
(an Internet2 high-performance backbone network in United States) illustrated in 




Fig 1. Abline Network. 



Figure [TJ with 12 nodes, 144 total traffic pairs (132 SD pairs and 12 self pairs), 
30 inner links, and 24 edge links. We collect the full 12 x 12 SD traffic matrices in 
5 min intervals for consecutive 19 weeks in 2004. We randomly pick four different 
periods of 3 days and use the data in these four time periods. We call these four raw 
datasets as XI, X2, X3, and X4. It turns out that the four datasets give different 
traffic patterns as the time periods cover different days of the week, cf. Figure O 
For each dataset and each hour, we compute x as the hourly total SD flow and 
y = Ax with a fixed routing matrix A used in the Abilene data. 

We compare four procedures using the complete data y as y*: the ITG (|2.9p . the 
simple tomogravity (STG) in (1.4), the generalized tomogravity (GTG) of Zhang et 



al. [11( utilizing the extra information of "access" or "peering" status of links, and 
the entropy regularized tomogravity (ERTG) in (|1.5|) . Since the traffic flow for self 
pairs (s = d) is directly observable as the load on the self links, the ITG and STG 
estimate these components of x without error. Thus, we measure the performance 
of all estimators by the relative total error for non-self SD pairs 

(3.1) \x sd - x sd \/ y^ y x sd , 

where x sd is the flow from source s to destination d. We compute the relative total 
error for (|1.5|) with various values of the tuning parameter <fi and found that the 



Table 1 

Average of relative total errors for 288 different hours (4 different 3-day periods) based on 
complete link data. The best tuning parameter is used for the ERTG, while extra 
information is used for the GTG. 



Method 


risk 


Iterative Tomogravity (ITG) 


0.3001 


Entropy Regularized (ERTG) 


0.2995 


Simple Tomogravity (STG) 


0.3139 


Generalized Tomogravity (GTG) 


0.3026 



J Fang, Y. Vardi and C.-H. Zhang 



2.4 
2.2 
2 
1.8 
1.6 
1.4 
1.2 



x10' v 



dataset X1 



dataset X2 




20 40 60 80 




80 



2 
1.8 
1.6 
1.4 
1.2 

1 

0.8 



x10" 



dataset X3 



dataset X4 






20 40 60 80 



Fig 2. The total hourly traffic for the 4 non-overlapping 3 day periods. 
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Fig 3. Compare of the error rate using different models, dataset XI. 
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Fig 4. Compare of the error rate using different models, dataset X2. 




Fig 5. Compare of the error rate using different models, dataset X3. 
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Fig 6. Compare of the error rate using different models, dataset X4- 



performance of (|1.5p is near the best in a wide neighborhood of = 1CP 3 = 0.001. 
This confirms the results of Zhang et al. ["l2^ . Thus, <f> — 10~ 3 = 0.001 is used for 
(jl .5|) in our experiment. We plot the relative total error (|3 . 1 [) against hour for the 
four datasets in Figures [3J [4] [5J and [6] We tabulate the average relative error in 
Table 1. From the results of the experiments, we observed that the performance of 
the proposed ITG is comparable to the ERTG with the best choice of the tuning 
parameter and the GTG based on extra information, while all three outperform the 
STG. 

We also exam the relative errors for different SD pairs as functions of the total 
traffic flow for the SD pairs. We compute the relative total error over 3-day time 
periods 




for fixed SD pairs in individual datasets, where t indicates time points with t* = 72. 
We group the values of l|3.2[) for SD pairs in all datasets according to the total flow 

Etli^d with the S rid {0,1/4,1/2,3/4,1,1.5,2,2.5,3,4,5,7} in the unit of 10 10 
packets, and tabulate in Table 2 the average of (|3.2p within groups. From Table 
2, we observe that the estimation error is essentially a decreasing function of the 
amount of traffic for individual SD pairs. 

Finally, we check the robustness of the ITG (|2.9p with missing link data (i.e. y* 
is a proper sub- vector of y). We focus on the case of missing data in edge links as 
the ITG is the only procedure among the four that do not require observations for 
all edge links. Let k be the number of edge links with missing data. We use only 
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Table 2 

Relative total errors over 72 hours for fixed SD pairs and 3-day periods, grouped according to 
the total flow. The relative total errors are decreasing functions of the flow for all 4 procedures. 



Flow 


# in 


ITG 


ERTG 


STG 


GTG 


Level 


Group 










- 0.25 


215 


4.4799 


5.8725 


4.5833 


5.3545 


0.25 - 0.5 


100 


0.4320 


0.4279 


0.4548 


0.4158 


0.5 - 0.75 


73 


0.3457 


0.3449 


0.3666 


0.3467 


0.75 - 1 


30 


0.2997 


0.2992 


0.3379 


0.2505 


1 - 1.5 


16 


0.2286 


0.2305 


0.2402 


0.2588 


1.5 - 2 


25 


0.2878 


0.2859 


0.2934 


0.3089 


2 - 2.5 


18 


0.1836 


0.1828 


0.1802 


0.2080 


2.5 - 3 


6 


0.1583 


0.1576 


0.1689 


0.1207 


3-4 


7 


0.1143 


0.1126 


0.1335 


0.1261 


4-5 


6 


0.1456 


0.1448 


0.1514 


0.1373 


5-7 


2 


0.0887 


0.0938 


0.0767 


0.0882 



data for the first day in dataset XI and compute the average of the relative total 
error for 10 random missing patterns for each given k. We plot this average against 
k in Figure [7] From Figure El we find that the performance of the ITG method is 
robust against small or moderate amount of missing link data (up to 5 out of 24 
edge links). 



4. Discussion 



We consider the estimation of SD traffic flow in a network based on observations of 
a snapshot of traffic loads on links. Based on the ideas of Vardi [10( and Zhang et 
al. 11, IH, we propose an iterative tomogravity method which allows incomplete 



observation of the link data. Our main idea is to use the gr avity space (|2.4j) . instead 
of the simple gravity solution (|1.2jl . to regularize Vardi's [l0( tomographic model 
(II. 1[) . A numerical study with a real- life dataset demonstrates that the proposed 
method has similar performance compared with the methods proposed in (III . [l2T ] 
which demand complete observation of the link data. We discuss below a number 
of related issues. 

There are two other possible ways of using the gravity space (|2.4|) to regularize 
(|l.ip that we do not explore in this paper. The first is to use the ITG (I2.9[) instead 
of the simple gravity (|1.2p in the penalty function in (|1.5p . resulting in 

(4.1) argmin j||y* - A*x[[ 2 + </>A 2 A:(x/A, g (fin VA) } 

with the N in (|2.8p . The second is to alternate between the optimization in the 
gravity space and entropy-regularized solution, i.e. to replace (|2.6p with 



(4.2) A< new ) = (lV)/(l T A*g (old) ) 

f (new) = argmin j|| y * _ tf(new) A . f ||a 

(4.3) +0{iV( new )} 2 ^(f,g(° ld )) : f G T*}. 

A small numerical study seems to indicate that there is little difference between 
(TO) and the ITG. 

The proposed ITG (|2.9p implicitly assumes that the measurement error in the 
tomographic model (|2.ip is of smaller order than the bias representing the Kullback- 
Leibler distance K(x/N,G*) between x/AT and the gravity space (|2.4p . This seems 
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Fig 7. Relative total errors of the ITG versus the number of edge links with missing data. Average 
over 10 random missing patterns is used for each point in the plot. The ITG is robust against 
small or moderate amount of missing link data. 



to be the case in our real-data experiments since ITG significantly improves upon 
the simple tomogravity (|1.4p by formally reducing K(x/N,5t/N) to K(x./N,Q*). 
In cases where the measurement error in the tomographic model is potentially of 
larger order than K(x./N,Q*) [or K(x/N,Sc/N)] it would make sense to replace 
(|2.6[) by (|4.2[) and (|4.3|) in ITG [or to use (|1.5[) ] with a proper tuning parameter <j>. 
A possibility to further reduce the bias is to consider the mixed gravity model 

(4.4) ^niix=| f :f = i>fW f (fe) eej. 

For example, we may compute a regularized mixed tomogravity solution 



(4.5) argmin< 



fc=i 



k* " 

iV 2 V^(f (fe) ,g (fc) ) 



k=l 



by alternately optimizing over g( fe ) G Q, i^ k \ k = 1, . . . , k* and the mixing vector 

(7Tl,...,7r fc .) T . 

It seems that for a network with a fixed routing protocol, the ITG estimate x 
in (|2.9|) is a continuous map of y* , so that x — Ex is asymptotically normal when 
y* — EA*x is asymptotically normal with Etc/N 6 Q, as N — * 00. Our simulation 
study in a small artificial network has demonstrated the validity of this asymptotic 
normality theorem for moderate sample sizes. 

Estimation of traffic matrix based on link-load data alone is difficult as the 
estimation error is typically above 20%. More accurate results can be obtained if 
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additional information can be extracted from packets passing through routers. See 
for example Zhao, Kumar, Wang and Xu [13|. 
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